Experimental Research

PSCI 2270 - Week 10

Georgiy Syunyaev

Department of Political Science, Vanderbilt University

November 2, 2023

Plan for this week



  1. Project Updates

  2. Experiments and Their Types

  3. Example: Censorship in China

  4. Example: Conjoint and Audit

Any project updates?

Experiments

Recap on causal inference


  • What does “\(T_i\) causes \(Y_i\)” mean? \(\Rightarrow\) counterfactuals or what-if’s

    • Question: Would citizen \(i\) have supported gay marriage if they had been exposed to the LGBT community?
    • Hypothesis: Contact theory suggests that outgroup hostility diminishes when people from different groups interact with one another
  • Two potential outcomes:

    • \(Y_i (1)\): would \(i\) have supported gay marriage if they had contact with a member of the LGBT community?
    • \(Y_i (0)\): would \(i\) have supported gay marriage if they didn’t have contact with a member of the LGBT community?
  • Causal effect: \(Y_i (1) − Y_i (0)\)

  • Fundamental problem of causal inference: Only one of the two potential outcomes is observable

Recap on causal inference (2)


  • We want to estimate the average causal effect among everyone: \(\frac{1}{n} \sum_{i = 1}^n \left( Y_i (1) − Y_i (0) \right)\)
  • What we can estimate instead: \[\text{Difference-in-means} = \frac{1}{n_T} \sum_{i = 1}^{n_T} Y_i (1) − \frac{1}{n_C} \sum_{i = 1}^{n_C} Y_i (0)\]

    • \(\frac{1}{n_T} \sum_{i = 1}^{n_T} Y_i (1)\): average support for gay marriage among those who had contact with a member of the LGBT community (\(n_T\) is the number of people in this group)
    • \(\frac{1}{n_C} \sum_{i = 1}^{n_C} Y_i (0)\): average support for gay marriage among those who did not have contact with a member of the LGBT community (\(n_C\) is the number of people in this group)
  • How do we ensure that the difference-in-means is a good estimate of the ATE?
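The logic above can be checked in a minimal simulation (a Python sketch; the potential outcomes and sample size are made up for illustration). In simulation we can build both potential outcomes for every unit, which lets us verify that the difference-in-means under random assignment recovers the true ATE:

```python
import random

random.seed(42)

n = 10_000
# hypothetical potential outcomes (made-up numbers): Y_i(0) is baseline
# support, Y_i(1) adds an individual treatment effect of about 0.1
y0 = [random.gauss(0.4, 0.1) for _ in range(n)]
y1 = [y + 0.1 + random.gauss(0, 0.02) for y in y0]

# the true ATE, observable only because this is a simulation
true_ate = sum(a - b for a, b in zip(y1, y0)) / n

# randomize: each unit goes to treatment with probability 0.5
t = [random.random() < 0.5 for _ in range(n)]
treated = [y1[i] for i in range(n) if t[i]]
control = [y0[i] for i in range(n) if not t[i]]
diff_in_means = sum(treated) / len(treated) - sum(control) / len(control)

print(round(true_ate, 3), round(diff_in_means, 3))  # nearly identical
```

With real data only one of `y0[i]` or `y1[i]` would ever be observed; the simulation sidesteps the fundamental problem of causal inference to show the estimator works.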

Experiments to the rescue


  • Randomize!
  • Key idea: Randomization of the treatment makes the treatment and control groups “identical” on average

  • The two groups are expected to be similar in terms of all characteristics (both observed and unobserved)

    • Control group is similar to treatment group
    • Outcome in control group \(\approx\) what would have happened to treatment group if they did not receive treatment
    • vice versa
  • In the gay marriage example: Send canvassers to knock on people’s doors and randomly assign some households to canvassers who are members of the LGBT community
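The “identical on average” claim can also be seen directly (a Python sketch with a hypothetical pre-treatment covariate): randomization never looks at the covariate, yet the two groups end up balanced on it.

```python
import random

random.seed(1)

n = 10_000
# a hypothetical pre-treatment covariate (say, age) that the
# randomization procedure never looks at
age = [random.randint(18, 80) for _ in range(n)]

# random assignment: each unit treated with probability 0.5
t = [random.random() < 0.5 for _ in range(n)]

mean_treated = sum(a for a, d in zip(age, t) if d) / sum(t)
mean_control = sum(a for a, d in zip(age, t) if not d) / (n - sum(t))
print(round(mean_treated, 1), round(mean_control, 1))  # nearly equal
```

The same balance holds in expectation for unobserved characteristics too, which is exactly what no observational design can guarantee.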

Random assignment



  • A reproducible procedure that generates assignments with known probability between \(0\) and \(1\)

    • Coin flips
    • Selection from a deck of cards
    • Computerized random number generator
    # draw 20 assignments, each 1 (treatment) with probability 0.5
    sample(c(0, 1), size = 20, prob = c(.5, .5), replace = TRUE)
     [1] 0 1 1 1 1 0 1 0 1 0 1 0 0 1 0 0 1 0 1 1
  • Note: As long as the probability is known it does not have to be equal! E.g. we can assign 30% to treatment and 70% to control
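The 30/70 split in the note works the same way (a Python sketch, with a hypothetical 20 units):

```python
import random

random.seed(7)

# assign each of 20 units to treatment (1) with probability 0.3,
# control (0) with probability 0.7 -- unequal but known probabilities
assignment = random.choices([0, 1], weights=[0.7, 0.3], k=20)
print(assignment)
```

What matters for the analysis is that every unit’s assignment probability is known and strictly between 0 and 1, not that the probabilities are equal.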

Treatment

  • The treatment is the variable that the researcher manipulates in an effort to study its effects on outcomes

    • Sometimes called the intervention or independent variable
    • Separates the study sample into experimental conditions: treatment group and comparison (control) group
  • For example:

    • In the gay marriage study the treatment is whether person has contact with member of LGBT community
    • In the legislator discrimination study the treatment is whether we send the email from a black alias
    • Do you remember any other treatments in studies we discussed?
    • What is the treatment in your class project?
  • Note: You can have more than one treatment group! E.g. some people are assigned to one version of the treatment, some to another, and some to control

Outcomes



  • The outcome variable (also known as the dependent variable) is a quantity that may be influenced by a treatment

  • For example:

    • Contact with LGBT community member (the treatment) may be thought to reduce prejudice against gay marriage (the outcome)
    • Sending an email from a black alias (the treatment) may be thought to reduce the chances of receiving a reply (the outcome)
  • Note: A treatment might be thought to influence several outcomes!

Core assumptions


  • Random assignment of subjects to treatments

    • Implies that receiving the treatment is statistically independent of subjects’ potential outcomes
    • Ask yourself: Can you explain the randomization procedure so that others can verify it?
  • Non-interference (no spillovers): A subject’s potential outcomes reflect only whether they receive the treatment themselves

    • A subject’s potential outcomes are unaffected by how the treatments happened to be allocated to other subjects
    • Ask yourself: Is it likely that one unit receiving your treatment will affect other units’ outcomes?

Core assumptions



  • Excludability: A subject’s potential outcomes respond only to the defined treatment, not to other extraneous factors that may be correlated with treatment

    • Importance of defining the treatment precisely and maintaining symmetry between treatment and control groups (e.g., through blinding)
    • Ask yourself: Are there parts of my treatment, distinct from my core independent factor, that could affect the outcome?

Violations of key assumptions


  • Let’s split into groups and see how these core assumptions could be violated in the gay marriage example


  • Random assignment: Researchers selected the units by looking at their pre-treatment characteristics \(\Rightarrow\) Fail!

  • Non-interference: Study participants are neighbors and those who were contacted by LGBT community members talked to those who were not \(\Rightarrow\) Fail!

  • Excludability: LGBT community members who conducted interviews knew about study purposes and tried to make respondents more receptive to the survey \(\Rightarrow\) Fail!

Other issues: Conceptual treatments


  • What if your treatment is too abstract and is not represented by concrete intervention?

    • Sometimes the treatment is defined as an abstract concept, e.g., uncivil discourse in the context of a political debate
    • \(\Rightarrow\) There may be slippage between the actual and intended treatment
  • Solution: A manipulation check is an attempt to verify that the treatment received was akin to the treatment that the researcher intended to deploy

    • Is it the case that those exposed to putatively uncivil discourse rate it as more combative or confrontational than those exposed to civil discourse?
    • In the gay marriage example, what could be the slippage? How would we conduct manipulation check?

Other issues: Non-compliance


  • What if participants don’t receive the intended treatment?

    • A common problem in experiments that attempt to communicate with the treatment group; often, people cannot be reached
    • In the gay marriage example: Some participants did not answer the door when the canvasser came
  • Consider an extreme case in which no one in the treatment group receives the treatment

  • Solution: Try to collect the data on whether respondents received treatment/control conditions and assess effects among those who comply
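Non-compliance can be made concrete with a simulation (a Python sketch; the effect size and compliance rate are made up). When only part of the treatment group actually receives the treatment, the naive difference in means shrinks toward zero; in the extreme case of zero compliance it would be zero.

```python
import random

random.seed(3)

n = 10_000
effect = 0.2       # hypothetical true treatment effect (made up)
compliance = 0.5   # only half the treatment group answers the door

assigned = [random.random() < 0.5 for _ in range(n)]
# treatment is received only if assigned AND the subject complies
received = [a and random.random() < compliance for a in assigned]
y = [0.4 + (effect if r else 0.0) + random.gauss(0, 0.1) for r in received]

treated_mean = sum(v for v, a in zip(y, assigned) if a) / sum(assigned)
control_mean = sum(v for v, a in zip(y, assigned) if not a) / (n - sum(assigned))
naive_diff = treated_mean - control_mean
print(round(naive_diff, 2))  # close to effect * compliance = 0.1, not 0.2
```

Comparing by assignment rather than by receipt is why collecting data on who actually received the treatment matters for interpreting the estimate.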

Other issues: Experimenter effects


  • What if study participants guess what you are studying?

    • They can manipulate their behavior or attitudes to fit what they think you are studying
    • In gay marriage example: Respondents guessed that you are trying to improve their attitudes towards gay marriage
  • Solution: Mix your main outcome questions with questions about other topics, and make sure that neither participants nor enumerators know about the experimental nature of your study or what the treatment is (blinding)

Flavors of experiments


  • Lab experiments: Participants are invited into a lab (at the university or in the field) to participate in the study

    • Gives researchers the highest level of control over how the study is conducted
    • Allows interactions between study participants
    • The most artificial setting
  • Survey experiments: During a survey (online/phone/in-person), some respondents are randomly assigned to receive certain questions/vignettes, or the order of questions is randomized; outcomes are measured during the same survey

    • Less pressure on participants and allows researchers to reach a larger population
    • Researchers have less control over how data are collected, especially in online surveys

Flavors of experiments



  • Field experiments: Researchers administer an intervention in a naturalistic and unobtrusive setting and measure outcomes later via surveys or administrative data

    • Least artificial setting, since measurement is separated from the intervention
    • Allows researchers to study populations that are hard or impossible to reach in survey or lab experiments (e.g. politicians)
    • Costly and logistically complex, but the payoff is high

Summarizing experimental design



  • What is the research hypothesis?

  • What are the experimental conditions?

  • Who (or what) are the subjects?

  • How are the subjects assigned to treatment(s)?

  • In what context does the experiment take place?

  • How are outcomes measured?

  • How do researchers estimate average treatment effect?

  • Are there any threats to inference? Think about random assignment, non-interference, excludability, experimenter effects, treatment meaning

Censorship in China

Censorship in China

  • “Reverse-Engineering Censorship in China: Randomized Experimentation and Participant Observation.” by King, Pan, and Roberts (2014)
  • Summary:

    • Experimental study of what is censored on social media platforms in China
    • Two competing theories: censorship of anti-government statements vs. censorship of collective action statements
    • Conduct participant observation to provide qualitative evidence on how the censorship is structured
    • Created and posted many social media posts, varying whether they contain collective action topics and whether they take pro- or anti-government stances
    • Find strong evidence in favor of censorship of collective action statements

What is the research hypothesis?


  • Why do autocrats censor?

    • To stay in power and prevent public from overthrowing them
  • How can this be achieved with censorship?

    1. By projecting a positive image of the leader \(\Rightarrow\) censor anti-government discussions
    2. By making people believe that no one is dissatisfied with the government \(\Rightarrow\) censor collective action
  • Hypothesis: The authors argue that the collective action theory is correct, not the positive-image theory.

Experimental conditions? Subjects?


  • What is the treatment in the study? Two separate treatments

    1. Social media posts with collective action potential vs. those without it
    2. Social media posts that take a pro-government vs. anti-government stance
  • Who (or what) are the subjects?

    • Not humans (!) but rather social media posts
  • Design: Factorial audit experiment

    • Factorial design: With two separate treatments, we create experimental groups covering all possible (\(2^2\)) combinations: \(00\) (control), \(01\), \(10\), \(11\)
    • Audit experiment: Measures responsiveness and discrimination in bureaucracy, government, and other organizations. Units are not humans but actions that should trigger a response (usually from government).
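The four cells of a \(2^2\) factorial can be written out, and assignment amounts to randomizing each factor separately (a Python sketch; the labels and the 12-post batch are illustrative):

```python
import itertools
import random

random.seed(11)

# the 2^2 cells: (collective action potential, anti-government stance)
cells = list(itertools.product([0, 1], repeat=2))
print(cells)  # [(0, 0), (0, 1), (1, 0), (1, 1)]

# assign, say, 12 posts on one platform by randomizing each factor separately
posts = [(random.randint(0, 1), random.randint(0, 1)) for _ in range(12)]
print(posts)
```

Randomizing the factors independently is what later lets each treatment be analyzed with its own difference in means.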

Treatment assignment and context


  • How did they assign the treatment?

    • Scraped the topics during the study and wrote the posts themselves
    • Randomly assigned posts during the study
  • What is the context of the study?

    • Selected 100 Chinese social media platforms and created users on them
    • 12 posts per platform, for a total sample size of 1,200
  • Possible issues?

    • Excludability: How do they ensure posts do not contain anything besides the relevant pro-/anti-government information?
    • What is treatment: Are the topics they select actually representative of collective action posts?
    • Random assignment: Is the random procedure clear?

Outcomes? Analysis?


  • What are the outcomes they measure?

    1. Post selected for automated review
    2. Post censored (either after review or after publishing)
  • How do they analyze the data?

    • Look at differences in means for each treatment (collective action vs. not; pro- vs. anti-government) separately
    • This is the beauty of factorial designs!

Outcomes explained

Main results

Concerns? Ideas?



  • Should we be concerned about non-interference? Likely No!

  • Should we be concerned about experimenter effects? Yes!

  • Any other concerns?

  • Do you think audit experiments are useful? Let’s discuss where else we can use them in groups

Attitudes towards Immigrants

Attitudes towards Immigrants

  • “The Hidden American Immigration Consensus: A Conjoint Analysis of Attitudes Toward Immigrants.” by Hainmueller and Hopkins (2015)
  • Summary:

    • Experimental study of Americans’ preferences over types of immigrants
    • Theories: partisan, economic, and sociotropic factors could affect preferences for immigrants
    • Conduct a study where respondents are asked to compare two immigrant profiles at a time
    • Find support for an overall preference for high-skilled and educated immigrants who plan to work, but no differences across partisan attachment

What is the research hypothesis?


  • There is a perception that Democrats and Republicans vary in terms of their preferences on immigration policy

    • Democrats are expected to be more accepting of immigrants regardless of their profiles
  • Alternative theories

    • Economic: Job threat as a factor that affects preferences on immigration
    • Sociotropic: Preferences for immigrants who are more likely to contribute to the economy
    • Norms-based: Preferences for immigrants who are more likely to assimilate
    • Prejudice: Preferences for white/European immigrants

Experimental conditions? Subjects?


  • What is the treatment in the study? Many separate treatments

    • Prior trips; Reasons for application; Country of origin; Language skills; Profession; Job experience; Employment plans; Education level; Gender… 🤯
  • Who (or what) are the subjects?

    • Americans in nationally representative panel study
  • Design: Conjoint experiment

    • \(\approx\) factorial design on steroids!
    • Borrowed from marketing research, where it is used to test different product features; allows testing a large list of features
    • Ask people to rate or compare different profiles (not necessarily of humans, e.g. policies/laws/events)
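Feature-by-feature randomization is what makes conjoint designs scale. A Python sketch with a few illustrative attributes (the attribute names and levels here are placeholders, not the study’s actual coding):

```python
import random

random.seed(5)

# illustrative attribute levels -- placeholders, not the paper's coding
attributes = {
    "education": ["no formal", "high school", "college degree"],
    "profession": ["janitor", "teacher", "doctor"],
    "language": ["fluent English", "broken English", "none"],
}

def draw_profile():
    # every feature is randomized separately, so for any one feature the
    # profiles are balanced on all other features in expectation
    return {attr: random.choice(levels) for attr, levels in attributes.items()}

# a respondent compares two independently drawn profiles
pair = (draw_profile(), draw_profile())
print(pair)
```

Because each attribute is drawn independently, not every one of the huge number of possible profiles needs to appear for each feature’s effect to be estimable.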

Treatment assignment and context


  • How did they assign the treatment?

    • Showed two profiles with randomly assigned features (every feature is randomized separately)
    • There are roughly 900,000 possible profiles, so not all profiles are used…
    • Is that a problem? No, because in expectation the profiles are balanced on all other features
  • What is the context of the study?

    • Representative panel survey in the US (i.e. each person is interviewed more than once)
    • 1,714 completed first wave, 1,407 completed second wave
  • Possible issues?

    • Random assignment: Is the random procedure clear?
    • Conceptual treatment: There is a priming component in listing a feature in the table. Are we sure those factors are all relevant?

Outcomes? Analysis?



  • What are the outcomes they measure?

    1. Comparison between two profiles
    2. Rating of profiles on absolute scale
  • How do they analyze the data?

    • Look at differences in means for each feature (marginalized over other dimensions)
    • This is the beauty of factorial/conjoint designs!

Outcomes explained

Main results

Concerns? Ideas?



  • Should we be concerned about experimenter effects? Yes!

  • Should we be concerned about non-interference? Yes!

  • Should we be concerned about many tests that they run? Yes!

  • Any other concerns?

    • Too many possible profiles can actually be a problem if some feature combinations are too sparse \(\Rightarrow\) ask each person to rate more than one pair of profiles
  • Do you think conjoint experiments are useful? Let’s discuss where else we can use them in groups

References

Hainmueller, Jens, and Daniel J. Hopkins. 2015. “The Hidden American Immigration Consensus: A Conjoint Analysis of Attitudes Toward Immigrants.” American Journal of Political Science 59 (3): 529–48. https://doi.org/10.1111/ajps.12138.
King, Gary, Jennifer Pan, and Margaret E. Roberts. 2014. “Reverse-Engineering Censorship in China: Randomized Experimentation and Participant Observation.” Science 345 (6199): 1251722. https://doi.org/10.1126/science.1251722.